
Add finance_excellent_cohort_revenue model#3

Open
zingleai wants to merge 1 commit into main from dp/d6dea61b

Conversation

@zingleai
Owner

@zingleai commented Feb 23, 2026

Test PR from data portal

Made with Cursor

Summary by CodeRabbit

Release Notes

  • New Features
    • Introduced product rating tier classification to organize products by quality ratings.
    • Added financial analytics for Excellent-rated products, including cohort-level revenue totals and individual product revenue share metrics.


coderabbitai bot commented Feb 23, 2026

📝 Walkthrough

The pull request introduces a new dbt finance model that analyzes revenue metrics for products with an "Excellent" rating tier (rating_tier = 'Excellent'). It updates the core dimension schema with explicit data types and a new column, and adds comprehensive schema documentation with governance controls and data quality tests for the new cohort model.

Changes

  • Core Dimension Schema Updates (dbt/models/marts/core/schema.yml): Added an explicit varchar data type to the product_id column and introduced a new rating_tier column with a description and data type specification.
  • Finance Excellent Cohort Model (dbt/models/marts/finance/finance_excellent_cohort_revenue.sql, dbt/models/marts/finance/schema.yml): Created a new dbt model that filters products by the Excellent rating tier and aggregates cohort-level revenue metrics, including each product's percentage share of cohort revenue. Added comprehensive schema documentation with governance metadata, access controls, materialization config, and data quality tests.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A cohort fine, with ratings supreme,
Excellence shines in our finance theme,
With revenue tallied and percentages true,
The Excellent products shine bright and new! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped - CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The PR title 'Add finance_excellent_cohort_revenue model' directly and clearly describes the main change: introducing a new dbt finance model. It is concise, specific, and accurately reflects the primary objective.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; skipping the docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch dp/d6dea61b

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
dbt/models/marts/finance/finance_excellent_cohort_revenue.sql (1)

41-47: Optional: eliminate the duplicate window computation via an extra CTE layer.

sum(product_total_revenue) over () is evaluated twice in the same SELECT (once for cohort_total_revenue, once inside pct_of_cohort_revenue). Most optimizers deduplicate this, but factoring it into a separate CTE makes the intent explicit and removes any optimizer dependency.

♻️ Proposed refactor
-final as (
+cohort as (
     select
         product_id,
         product_name,
         product_category,
         avg_rating,
         rating_tier,
         review_count,
         positive_reviews,
         negative_reviews,
         unit_price,
         unit_margin,
         margin_pct,
         total_units_sold,
         product_total_revenue,
         product_gross_profit,
-        sum(product_total_revenue) over ()                                            as cohort_total_revenue,
-        round(
-            product_total_revenue / nullif(sum(product_total_revenue) over (), 0) * 100,
-            2
-        )                                                                             as pct_of_cohort_revenue
+        sum(product_total_revenue) over () as cohort_total_revenue
     from excellent_products
-),
+),
+
+final as (
+    select
+        *,
+        round(
+            product_total_revenue / nullif(cohort_total_revenue, 0) * 100,
+            2
+        ) as pct_of_cohort_revenue
+    from cohort
+)
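The refactored shape can be exercised end to end. The sketch below is illustrative only: it uses a toy in-memory table with made-up product IDs and revenues (not the project's data) to show the two-step CTE pattern the review proposes, run via Python's stdlib sqlite3 (SQLite supports window functions since 3.25):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE excellent_products (product_id TEXT, product_total_revenue REAL);
INSERT INTO excellent_products VALUES ('p1', 600.0), ('p2', 300.0), ('p3', 100.0);
""")

rows = conn.execute("""
WITH cohort AS (
    SELECT
        product_id,
        product_total_revenue,
        -- the window sum is computed exactly once, in this CTE
        SUM(product_total_revenue) OVER () AS cohort_total_revenue
    FROM excellent_products
)
SELECT
    product_id,
    cohort_total_revenue,
    ROUND(product_total_revenue / NULLIF(cohort_total_revenue, 0) * 100, 2)
        AS pct_of_cohort_revenue
FROM cohort
ORDER BY product_id
""").fetchall()

print(rows)  # [('p1', 1000.0, 60.0), ('p2', 1000.0, 30.0), ('p3', 1000.0, 10.0)]
```

The NULLIF(cohort_total_revenue, 0) guard carries over unchanged: a zero-revenue cohort yields NULL rather than a division error, which is also why the later comment asks for a not_null test on pct_of_cohort_revenue.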
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbt/models/marts/finance/finance_excellent_cohort_revenue.sql` around lines
41 - 47, Extract the duplicated window sum into an intermediate CTE so the
expression sum(product_total_revenue) over () is computed once; in practice, add
a CTE (e.g., excellent_products_with_total) that selects product_total_revenue
and computes cohort_total_revenue as sum(product_total_revenue) over (), then in
the final SELECT derive pct_of_cohort_revenue = round(product_total_revenue /
nullif(cohort_total_revenue, 0) * 100, 2); update references to use
excellent_products_with_total (or similarly named CTE) and keep column names
product_total_revenue, cohort_total_revenue, pct_of_cohort_revenue the same.
dbt/models/marts/core/schema.yml (1)

91-97: Add not_null and accepted_values tests to rating_tier.

The column drives the cohort filter in finance_excellent_cohort_revenue (WHERE rating_tier = 'Excellent'). Without data-quality guards, a NULL value (possible when avg_rating is NULL from the left join with reviews) or a misspelled tier silently produces an empty or corrupt cohort. The description already enumerates the four valid tiers — making accepted_values straightforward. Business rules that require columns to use a valid set are a natural fit for dbt's accepted_values test.

🛡️ Proposed tests for rating_tier
       - name: rating_tier
         description: >
           Business-logic classification of product quality derived from avg_rating.
           Tiers are defined as: Excellent (avg_rating >= 4.5), Good (avg_rating >= 3.5),
           Average (avg_rating >= 2.5), and Poor (avg_rating < 2.5). Used as the cohort
           key for rating-based revenue and margin analysis (e.g. finance_excellent_cohort_revenue).
         data_type: varchar
+        tests:
+          - not_null
+          - accepted_values:
+              values: ['Excellent', 'Good', 'Average', 'Poor']
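Why both tests are needed can be seen from the query shape an accepted_values test boils down to (roughly, rows where the column falls outside the allowed set). The sketch below uses a hypothetical dim_products table with invented rows; note that SQL three-valued logic means a NULL rating_tier is NOT flagged by the NOT IN check, which is exactly what the separate not_null test covers:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_products (product_id TEXT, rating_tier TEXT);
INSERT INTO dim_products VALUES
    ('p1', 'Excellent'),
    ('p2', 'Exellent'),  -- misspelled tier: caught by the accepted-values check
    ('p3', NULL);        -- NULL: slips through NOT IN, needs not_null
""")

violations = conn.execute("""
SELECT product_id, rating_tier
FROM dim_products
WHERE rating_tier NOT IN ('Excellent', 'Good', 'Average', 'Poor')
""").fetchall()

print(violations)  # [('p2', 'Exellent')] -- the NULL row is not returned
```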
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbt/models/marts/core/schema.yml` around lines 91 - 97, Add dbt tests for the
rating_tier column to prevent NULLs and invalid tier values: in the schema.yml
entry for the model that defines rating_tier, add a not_null test for
rating_tier and an accepted_values test listing the four allowed strings
("Excellent", "Good", "Average", "Poor"); this ensures cohorts like
finance_excellent_cohort_revenue (which filters WHERE rating_tier = 'Excellent')
won't silently fail due to NULLs or misspellings.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dbt/models/marts/finance/schema.yml`:
- Around line 250-256: The pct_of_cohort_revenue field in the schema.yml is
missing a not_null test which allows NULLs when cohort_total_revenue is zero;
add the - not_null test under the pct_of_cohort_revenue column entry so dbt will
fail the build if the metric is NULL (this mirrors the existing not_null on
cohort_total_revenue and ensures the calculated column pct_of_cohort_revenue is
validated).
- Around line 161-165: The business_logic test "pct_of_cohort_revenue sums to
100" is using a fixed 0.01 tolerance which is too tight because
pct_of_cohort_revenue is rounded to 2 decimals and rounding error accumulates
with N products; update the test SQL to use a safe tolerance (either a practical
constant like 0.5 or a calculated maximum such as COUNT(DISTINCT
product_id)/200) instead of 0.01 — edit the test named "pct_of_cohort_revenue
sums to 100" in schema.yml and change the WHERE clause so it compares
ABS(SUM(pct_of_cohort_revenue) - 100) against the new threshold (e.g., > 0.5 or
> (COUNT(DISTINCT product_id)/200) computed from {{ model }}).

---

Nitpick comments:
In `@dbt/models/marts/core/schema.yml`:
- Around line 91-97: Add dbt tests for the rating_tier column to prevent NULLs
and invalid tier values: in the schema.yml entry for the model that defines
rating_tier, add a not_null test for rating_tier and an accepted_values test
listing the four allowed strings ("Excellent", "Good", "Average", "Poor"); this
ensures cohorts like finance_excellent_cohort_revenue (which filters WHERE
rating_tier = 'Excellent') won't silently fail due to NULLs or misspellings.

In `@dbt/models/marts/finance/finance_excellent_cohort_revenue.sql`:
- Around line 41-47: Extract the duplicated window sum into an intermediate CTE
so the expression sum(product_total_revenue) over () is computed once; in
practice, add a CTE (e.g., excellent_products_with_total) that selects
product_total_revenue and computes cohort_total_revenue as
sum(product_total_revenue) over (), then in the final SELECT derive
pct_of_cohort_revenue = round(product_total_revenue /
nullif(cohort_total_revenue, 0) * 100, 2); update references to use
excellent_products_with_total (or similarly named CTE) and keep column names
product_total_revenue, cohort_total_revenue, pct_of_cohort_revenue the same.
ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1fca229 and 0b62fcf.

📒 Files selected for processing (3)
  • dbt/models/marts/core/schema.yml
  • dbt/models/marts/finance/finance_excellent_cohort_revenue.sql
  • dbt/models/marts/finance/schema.yml

Comment on lines +161 to +165
        - name: "pct_of_cohort_revenue sums to 100"
          type: "business_logic"
          sql: "SELECT * FROM (SELECT ABS(SUM(pct_of_cohort_revenue) - 100) AS diff FROM {{ model }}) t WHERE diff > 0.01"
          expected_result: "0 rows"
          description: "Validates that pct_of_cohort_revenue values sum to 100% across all products, within a 0.01 rounding tolerance."

⚠️ Potential issue | 🟠 Major

pct_of_cohort_revenue sum tolerance of 0.01 will cause systematic false test failures.

pct_of_cohort_revenue is rounded to 2 decimal places. The maximum rounding error per product is ±0.005, so the accumulated error across N products is at most N × 0.005. With just 7 equal-revenue products the sum is already 100.03 (diff = 0.03 > 0.01), triggering a test failure. Any realistic "Excellent" cohort with non-trivially-divisible revenue shares will exceed the 0.01 threshold.

The tolerance should be at least N_products / 200 (i.e., the theoretical maximum accumulation), or a practical constant like 0.5:

🐛 Proposed fix
-          sql: "SELECT * FROM (SELECT ABS(SUM(pct_of_cohort_revenue) - 100) AS diff FROM {{ model }}) t WHERE diff > 0.01"
+          sql: "SELECT * FROM (SELECT ABS(SUM(pct_of_cohort_revenue) - 100) AS diff FROM {{ model }}) t WHERE diff > 0.5"
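The arithmetic in the comment above is easy to sanity-check. A toy calculation with 7 equal-revenue products (an assumed, worst-case-ish scenario, not real cohort data) shows the rounded shares overshooting the 0.01 tolerance while staying inside the theoretical N × 0.005 bound:

```python
# 7 products with equal revenue: each share is 100/7 %, rounded to 2 decimals.
n = 7
share = round(100 / n, 2)          # 14.29 per product
total = round(share * n, 2)        # 100.03
diff = round(abs(total - 100), 2)  # 0.03

assert diff > 0.01      # the fixed 0.01 threshold would flag this as a failure
assert diff <= n / 200  # yet it is within the theoretical max of N * 0.005

print(share, total, diff)  # 14.29 100.03 0.03
```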
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbt/models/marts/finance/schema.yml` around lines 161 - 165, The
business_logic test "pct_of_cohort_revenue sums to 100" is using a fixed 0.01
tolerance which is too tight because pct_of_cohort_revenue is rounded to 2
decimals and rounding error accumulates with N products; update the test SQL to
use a safe tolerance (either a practical constant like 0.5 or a calculated
maximum such as COUNT(DISTINCT product_id)/200) instead of 0.01 — edit the test
named "pct_of_cohort_revenue sums to 100" in schema.yml and change the WHERE
clause so it compares ABS(SUM(pct_of_cohort_revenue) - 100) against the new
threshold (e.g., > 0.5 or > (COUNT(DISTINCT product_id)/200) computed from {{
model }}).

Comment on lines +250 to +256
       - name: pct_of_cohort_revenue
         description: >
           Each product's share of the Excellent cohort's total revenue, expressed as a
           percentage (0–100). Calculated as product_total_revenue / cohort_total_revenue * 100,
           rounded to 2 decimal places. Useful for identifying which Excellent products
           are the largest revenue contributors within the cohort.
         data_type: numeric

⚠️ Potential issue | 🟡 Minor

pct_of_cohort_revenue is missing a not_null test.

cohort_total_revenue directly above it has - not_null, but pct_of_cohort_revenue does not. The SQL computes it as product_total_revenue / nullif(sum(...) over (), 0) * 100, which produces NULL whenever the cohort revenue sum is zero. A not_null test here surfaces that edge case at test time rather than silently exposing NULLs to consumers.

🛡️ Proposed addition
       - name: pct_of_cohort_revenue
         description: >
           Each product's share of the Excellent cohort's total revenue, expressed as a
           percentage (0–100). Calculated as product_total_revenue / cohort_total_revenue * 100,
           rounded to 2 decimal places. Useful for identifying which Excellent products
           are the largest revenue contributors within the cohort.
         data_type: numeric
+        tests:
+          - not_null
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbt/models/marts/finance/schema.yml` around lines 250 - 256, The
pct_of_cohort_revenue field in the schema.yml is missing a not_null test which
allows NULLs when cohort_total_revenue is zero; add the - not_null test under
the pct_of_cohort_revenue column entry so dbt will fail the build if the metric
is NULL (this mirrors the existing not_null on cohort_total_revenue and ensures
the calculated column pct_of_cohort_revenue is validated).
